Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Author

  • Bruno Scherrer
Abstract

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advantage. We show that Howard’s PI terminates after at most n(m − 1)⌈
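The two update rules named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-state toy MDP, the names `P`, `R`, `GAMMA`, `evaluate`, `advantages`, and `policy_iteration`, the use of fixed-point iteration for policy evaluation, and the choice to switch each selected state to an advantage-maximizing action.

```python
from typing import List, Tuple

# Hypothetical toy MDP (not from the paper).
# P[s][a] is a list of (next_state, probability) pairs; R[s][a] is the reward.
GAMMA = 0.9
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: stay, or move to state 1
    [[(1, 1.0)], [(2, 1.0)]],  # state 1: stay, or move to state 2
    [[(2, 1.0)], [(0, 1.0)]],  # state 2: stay, or move to state 0
]
R = [
    [0.0, 0.5],
    [0.0, 0.0],
    [1.0, 0.0],
]


def q_value(s: int, a: int, v: List[float]) -> float:
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a])


def evaluate(policy: List[int], tol: float = 1e-12) -> List[float]:
    """Approximate the value of `policy` by fixed-point iteration."""
    v = [0.0] * len(P)
    while True:
        new_v = [q_value(s, a, v) for s, a in enumerate(policy)]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def advantages(policy: List[int]) -> List[List[float]]:
    """Advantage of every action in every state, relative to `policy`."""
    v = evaluate(policy)
    return [[q_value(s, a, v) - v[s] for a in range(len(P[s]))]
            for s in range(len(P))]


def policy_iteration(policy: List[int], simplex: bool,
                     eps: float = 1e-9) -> Tuple[List[int], int]:
    """Run PI and return (final policy, number of policy updates).

    simplex=False: Howard-style, switch every state with positive advantage.
    simplex=True:  Simplex-style, switch only the state with maximal advantage.
    """
    policy = list(policy)
    iterations = 0
    while True:
        adv = advantages(policy)
        best = [max(range(len(row)), key=row.__getitem__) for row in adv]
        gains = [adv[s][best[s]] for s in range(len(P))]
        improvable = [s for s in range(len(P)) if gains[s] > eps]
        if not improvable:
            return policy, iterations
        if simplex:
            improvable = [max(improvable, key=gains.__getitem__)]
        for s in improvable:
            policy[s] = best[s]
        iterations += 1


if __name__ == "__main__":
    print("Howard: ", policy_iteration([0, 0, 0], simplex=False))
    print("Simplex:", policy_iteration([0, 0, 0], simplex=True))
```

On this toy instance, starting from the policy that always stays in place, the Howard-style rule switches two states at once and reaches the same final policy in one update, whereas the Simplex-style rule needs two. This only illustrates the mechanics of the two rules; it says nothing about the worst-case bounds studied in the paper.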

Similar resources

UPPER BOUNDS FOR FINITENESS OF GENERALIZED LOCAL COHOMOLOGY MODULES

Let $R$ be a commutative Noetherian ring with non-zero identity and $\mathfrak{a}$ an ideal of $R$. Let $M$ be a finite $R$-module of finite projective dimension and $N$ an arbitrary finite $R$-module. We characterize the membership of the generalized local cohomology modules $H^{i}_{\mathfrak{a}}(M,N)$ in certain Serre subcategories of the category of modules from upper bounds. We define and study the properti...

Improved Strong Worst-case Upper Bounds for MDP Planning

The Markov Decision Problem (MDP) plays a central role in AI as an abstraction of sequential decision making. We contribute to the theoretical analysis of MDP planning, which is the problem of computing an optimal policy for a given MDP. Specifically, we furnish improved strong worst-case upper bounds on the running time of MDP planning. Strong bounds are those that depend only on the number of ...

An improved infeasible interior-point method for symmetric cone linear complementarity problem

We present an improved version of a full Nesterov-Todd step infeasible interior-point method for the linear complementarity problem over symmetric cones (Bull. Iranian Math. Soc., 40(3), 541-564, (2014)). In the earlier version, each iteration consisted of one so-called feasibility step and a few (at most three) centering steps. Here, each iteration consists of only a feasibility step. Thus, the new...

[hal-00829532, v3] Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov Decision Process (MDP) with n states and m actions per state, we study the number of iterations needed by Policy Iteration (PI) algorithms to converge to the optimal γ-discounted optimal policy. We consider two variations of PI: Howard’s PI that changes the actions in all states with a positive advantage, and Simplex-PI that only changes the action in the state with maximal advan...

On the Complexity of Policy Iteration

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...

Journal:
  • Math. Oper. Res.

Volume 41

Publication date: 2013